Goto

Collaborating Authors

 wolfe condition



Bilevel Optimization for Real-Time Control with Application to Locomotion Gait Generation

arXiv.org Artificial Intelligence

Model Predictive Control (MPC) is a common tool for the control of nonlinear, real-world systems, such as legged robots. However, solving MPC quickly enough to enable its use in real-time is often challenging. One common solution is given by real-time iterations, which does not solve the MPC problem to convergence, but rather close enough to give an approximate solution. In this paper, we extend this idea to a bilevel control framework where a "high-level" optimization program modifies a controller parameter of a "low-level" MPC problem which generates the control inputs and desired state trajectory. We propose an algorithm to iterate on this bilevel program in real-time and provide conditions for its convergence and improvements in stability. We then demonstrate the efficacy of this algorithm by applying it to a quadrupedal robot where the high-level problem optimizes a contact schedule in real-time. We show through simulation that the algorithm can yield improvements in disturbance rejection and optimality, while creating qualitatively new gaits.


Probabilistic Line Searches for Stochastic Optimization

Neural Information Processing Systems

In deterministic optimization, line searches are a standard tool ensuring stability and efficiency. Where only stochastic gradients are available, no direct equivalent has so far been formulated, because uncertain gradients do not allow for a strict sequence of decisions collapsing the search space. We construct a probabilistic line search by combining the structure of existing deterministic methods with notions from Bayesian optimization. Our method retains a Gaussian process surrogate of the univariate optimization objective, and uses a probabilistic belief over the Wolfe conditions to monitor the descent. The algorithm has very low computational cost, and no user-controlled parameters. Experiments show that it effectively removes the need to define a learning rate for stochastic gradient descent.


Learning-Rate-Free Learning: Dissecting D-Adaptation and Probabilistic Line Search

arXiv.org Artificial Intelligence

This report investigates the problem of learning rate optimisation, focusing on techniques that remove the programmer's burden to choose a proper initial learning rate. The report aims to satisfy two purposes: 1. Acting as an intuition-led guide to Defazio and Mishchenko's 2023 Learning-Rate-Free Learning by D-Adaptation [2] and Mahsereci and Hennig's 2015 Probabilistic Line Searches for Stochastic Optimisation [5]. 2. Presenting a unified notation to discuss optimisation techniques, allowing us to bring together the two learning-rate-free approaches and introduce probabilistics to D-Adaptation in the Discussion section (4). We will begin by recapping the general problem of optimisation. This will establish a common language through which to discuss optimisation algorithms, and introduce the notation used in Defazio et al's D-Adaptation paper.


A fast quasi-Newton-type method for large-scale stochastic optimisation

arXiv.org Machine Learning

During recent years there has been an increased interest in stochastic adaptations of limited memory quasi-Newton methods, which compared to pure gradient-based routines can improve the convergence by incorporating second order information. In this work we propose a direct least-squares approach conceptually similar to the limited memory quasi-Newton methods, but that computes the search direction in a slightly different way. This is achieved in a fast and numerically robust manner by maintaining a Cholesky factor of low dimension. This is combined with a stochastic line search relying upon fulfilment of the Wolfe condition in a backtracking manner, where the step length is adaptively modified with respect to the optimisation progress. We support our new algorithm by providing several theoretical results guaranteeing its performance. The performance is demonstrated on real-world benchmark problems which shows improved results in comparison with already established methods.


Probabilistic Line Searches for Stochastic Optimization

arXiv.org Machine Learning

In deterministic optimization, line searches are a standard tool ensuring stability and efficiency. Where only stochastic gradients are available, no direct equivalent has so far been formulated, because uncertain gradients do not allow for a strict sequence of decisions collapsing the search space. We construct a probabilistic line search by combining the structure of existing deterministic methods with notions from Bayesian optimization. Our method retains a Gaussian process surrogate of the univariate optimization objective, and uses a probabilistic belief over the Wolfe conditions to monitor the descent. The algorithm has very low computational cost, and no user-controlled parameters. Experiments show that it effectively removes the need to define a learning rate for stochastic gradient descent.


Probabilistic Line Searches for Stochastic Optimization

arXiv.org Machine Learning

In deterministic optimization, line searches are a standard tool ensuring stability and efficiency. Where only stochastic gradients are available, no direct equivalent has so far been formulated, because uncertain gradients do not allow for a strict sequence of decisions collapsing the search space. We construct a probabilistic line search by combining the structure of existing deterministic methods with notions from Bayesian optimization. Our method retains a Gaussian process surrogate of the univariate optimization objective, and uses a probabilistic belief over the Wolfe conditions to monitor the descent. The algorithm has very low computational cost, and no user-controlled parameters. Experiments show that it effectively removes the need to define a learning rate for stochastic gradient descent.


Probabilistic Line Searches for Stochastic Optimization

Neural Information Processing Systems

In deterministic optimization, line searches are a standard tool ensuring stability and efficiency. Where only stochastic gradients are available, no direct equivalent has so far been formulated, because uncertain gradients do not allow for a strict sequence of decisions collapsing the search space. We construct a probabilistic line search by combining the structure of existing deterministic methods with notions from Bayesian optimization. Our method retains a Gaussian process surrogate of the univariate optimization objective, and uses a probabilistic belief over the Wolfe conditions to monitor the descent. The algorithm has very low computational cost, and no user-controlled parameters. Experiments show that it effectively removes the need to define a learning rate for stochastic gradient descent.


A Quasi-Newton Approach to Nonsmooth Convex Optimization Problems in Machine Learning

arXiv.org Machine Learning

We extend the well-known BFGS quasi-Newton method and its memory-limited variant LBFGS to the optimization of nonsmooth convex objectives. This is done in a rigorous fashion by generalizing three components of BFGS to subdifferentials: the local quadratic model, the identification of a descent direction, and the Wolfe line search conditions. We prove that under some technical conditions, the resulting subBFGS algorithm is globally convergent in objective function value. We apply its memory-limited variant (subLBFGS) to L_2-regularized risk minimization with the binary hinge loss. To extend our algorithm to the multiclass and multilabel settings, we develop a new, efficient, exact line search algorithm. We prove its worst-case time complexity bounds, and show that our line search can also be used to extend a recently developed bundle method to the multiclass and multilabel settings. We also apply the direction-finding component of our algorithm to L_1-regularized risk minimization with logistic loss. In all these contexts our methods perform comparable to or better than specialized state-of-the-art solvers on a number of publicly available datasets. An open source implementation of our algorithms is freely available.